Abstract

Background: Thioredoxin reductase 1 (TXNRD1) and thioredoxin interacting protein (TXNIP), also known as thioredoxin binding protein 2 or vitamin D3-upregulated protein 1, are key players in oxidative stress control. Thioredoxin (TRX) is one of the major components of the thiol reducing system and plays multiple roles in cellular processes. The expression of TXNRD1, TXNIP and TRX has not been analyzed computationally in relation to the prognosis of breast cancer. High expression of TXNRD1 and low expression of TXNIP are associated with a worse prognosis in breast cancer.

Methods: Using bioinformatics applications, we performed sequence analysis, molecular modeling, template and fold recognition, and docking and scoring with thioredoxin as the target.

Results: The resultant model was validated based on the templates from the I-TASSER server, and the binding site residues were predicted. The predicted model was used for threading and fold recognition and was optimized using GROMACS. The generated model was validated using PROCHECK, the Ramachandran plot, Verify-3D and the ERRAT value from the SAVES server, and the results show that the model is reliable. Next, we obtained small molecules from PubChem and ChemBank, which are databases for selecting suitable ligands for our modeled target. These molecules were screened by docking with GOLD, and scores were obtained using Chemscore as the scoring function.

Conclusion: This study predicted the ligand interactions of four molecules with the minimized modeled protein structure and identified the best ligand, with the top scores, from about 500 molecules screened. The four molecules were 3-hydroxy-2,3-diphenylbutanoic acid, 4-amino-3-pentadecylphenol, 3-(hydroxyimino)-2,4-diphenylbutanenitrile and 2-ethyl-1,2-diphenylbutyl carbamate, which are proposed as possible hit molecules for the drug discovery and development process.
Introduction

Recent research has shown the importance of reduction/oxidation (redox) regulation in various biological phenomena. Thioredoxin (TRX) is one of the major components of the thiol reducing system and plays multiple roles in cellular processes such as proliferation, apoptosis, and gene expression. Reactive oxygen species (ROS) and the cellular thiol redox state are crucial mediators of multiple cell processes such as growth, differentiation and apoptosis (2).

Increased levels of thioredoxin occur in a number of human cancers and may contribute to the resistance of cancers to therapy by scavenging the ROS generated by various anti-cancer agents (3). Breast cancer is a malignant tumor that arises when cells in the breast become so over-active that they do not stop multiplying (4). Using bioinformatics approaches, we analyzed the sequences of TXNIP in order to develop a thioredoxin model that can be used as a suitable target for identifying novel lead molecules for breast cancer.

Members of the TRX system regulate apoptosis through a wide variety of mechanisms. A family of thioredoxin-dependent peroxidases (peroxiredoxins) protects against apoptosis by scavenging hydrogen peroxide. Thioredoxin-1 (Trx-1) is a small redox protein that is over-expressed in many human tumors, where it is associated with aggressive tumor growth and decreased patient survival. Trx-1 is secreted by tumor cells and is present at increased levels in the plasma of cancer patients. It is reported that thioredoxin 1 (Trx-1) and thioredoxin 2 (Trx-2) have opposing regulatory functions on hypoxia-inducible factor-1α. Thioredoxin 2 is a critical regulator of cytochrome c release and mitochondrial apoptosis, and the transmembrane thioredoxin-related molecule (TMX) has a protective role in endoplasmic reticulum (ER) stress-induced apoptosis (5).

Thioredoxin is known to have important 
roles in the cellular responses and several 
studies implicate thioredoxin as a contributor 
to cancer progression. In cancers the tumor 
environment is usually under either oxidative 
or hypoxic stress and both stresses are known 
up-regulators of thioredoxin expression (6) . 

The Trx system is a ubiquitous thiol-reducing system that includes Trx, Trx-interacting protein (Txnip), Trx reductase (Trxr) and NADPH. Trx is a small (12 kDa) protein with a conserved active site, Trp-Cys-Gly-Pro-Cys, that plays an important defensive role against oxidative stress by scavenging intracellular ROS. Binding of ROS leads to Trx oxidation. Trxr, in the presence of NADPH, can convert oxidized Trx back to its reduced form. Trx proteins are represented in the cell by at least two forms: Trx1, which is present in the cytoplasm, and Trx2, which is localized in the mitochondria (7).

The major aim of our study was to analyze the thioredoxin system, which is an important target in drug discovery for several diseases. Specifically, we aimed to use this system to identify candidate cancer drugs from drug databases; computational methods were therefore used to identify possible inhibitors of thioredoxin.

Materials and Methods 

Thioredoxin sequence analysis 

The thioredoxin sequence forms the basis of our study, as thioredoxin is a good target for cancer chemotherapy. We selected the sequences from UniProtKB, a commonly used knowledge base for molecular sequences. Most of the sequences in UniProtKB are derived from the conceptual translation of nucleotide sequences. UniProtKB provides a stable, comprehensive, freely accessible central resource on protein sequences and functional annotation. We used computational analysis for the functional annotation of the sequences.

BLAST program with PSI-BLAST specification 
with PDB 

A Position-Specific Iterative BLAST (PSI-BLAST) profile was generated from local alignments of the highest-scoring hits in the initial BLAST results by calculating position-specific scores for every position in the alignment. Five template sequences were identified with this alignment program. The iterative procedure increased the sensitivity of the BLAST search and helped us to identify new relationships between the query and database entries. Clustal W was used for multiple sequence alignment (MSA) to determine the conserved sequences among the templates.
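The idea of position-specific scoring can be illustrated with a minimal Python sketch that derives log-odds scores for each column of a small alignment. The toy alignment, the uniform background frequency and the pseudocount are illustrative assumptions; PSI-BLAST itself applies sequence weighting and substitution-matrix-derived pseudocounts that are not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Gap characters and unknown symbols are simply skipped.
        counts = Counter(s[col] for s in aligned if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the column scores of an ungapped sequence of matching length."""
    return sum(column[res] for column, res in zip(pssm, seq) if res in column)


# Hypothetical toy alignment built around the Trx active-site motif.
toy_alignment = ["WCGPC", "WCGPC", "WCAPC", "FCGPC"]
pssm = build_pssm(toy_alignment)
motif_score = score_sequence(pssm, "WCGPC")
unrelated_score = score_sequence(pssm, "AAAAA")
```

A sequence that matches the conserved columns scores higher than an unrelated one, which is what lets each iteration pick up more distant relatives than a single fixed substitution matrix would.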
3-D model building

As the PSI-BLAST similarity obtained was less than 30%, we chose the fold recognition method to build the 3-D model. Fold recognition methods are particularly effective in two cases: first, when the sequence has little or no primary sequence similarity to any sequence with a known structure; second, when some model from the structure library represents the true fold of the sequence.
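The decision rule described here can be sketched in a few lines of Python. The sequences below are hypothetical, and percent identity is computed over aligned (non-gap) positions only, which is one of several conventions in use; the 30% cut-off is the one stated in the text.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def choose_modelling_route(identity, threshold=30.0):
    """Pick a modelling strategy from the query-template identity."""
    if identity >= threshold:
        return "comparative modelling"
    return "fold recognition"


# Hypothetical aligned fragments, not taken from the study.
close_identity = percent_identity("ACDEFGHIKL", "ACDQFAHLKV")    # 6 of 10 match
distant_identity = percent_identity("ACDEFGHIKL", "MNPQFSTVWY")  # 1 of 10 match
route = choose_modelling_route(distant_identity)
```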

Our study falls into the first category: we tried to recognize the structural fold of the target protein from a structure template library, given its sequence information, and then generated an alignment between the query and the recognized template protein, from which the structure of the query protein was predicted. We used the I-TASSER web server (8), which generated five predicted 3D models for the request. A scoring function (C-score), based on the relative clustering structural density and the consensus significance score of multiple threading templates, was used to estimate the accuracy of the I-TASSER predictions.

Scoring and validation of 3-D model 

The output of the I-TASSER server for our query protein included the prediction of secondary structure, the top five full-length models with confidence scores, the estimated TM-score and RMSD, the standard deviation of the estimations, and the top ten templates. The binding site predicted by the I-TASSER server suggests 26 amino acid residues as the possible binding site residues.

Threading/fold recognition 

Modeller 8 was used to construct the model 
based on the generated templates for the fol- 
lowing PDB ids 1G4MA, 2WTRB, 2R51A, 
1CF1C, 2FAUA. All templates were taken 
based on their folds to construct the model. 

Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation (9).

The following threading programs were used to collect the templates:
1: MUSTER
2: HHSEARCH
3: SP3
4: PROSPECT2
5: PPA-I
6: HHSEARCH I
7: FUGUE
8: SPARKS

This model was optimized using GROMACS 
(10) which is a molecular dynamics package 
primarily designed for biomolecular systems 
such as proteins and lipids. The minimized 
model obtained was used for virtual screening 
in order to filter the compounds using GOLD 
score. 

Molecular screening 

The use of virtual screening to discover new inhibitors is becoming a common practice in modern drug discovery.

Receptor-based virtual screens seek to "dock" members of a chemical library against a given protein structure, predicting the conformation and binding affinity of the small molecules (12). All candidate compounds that obey Lipinski's rule of five were collected from PubChem, which is the main resource for obtaining freely available bioassay data and is provided by the National Center for Biotechnology Information (NCBI) (13).

ZINC is a free database for virtual screening that contains over 4.6 million compounds in ready-to-dock 3D formats, available at http://zinc.docking.org. Molecules in ZINC are annotated by molecular properties that include molecular weight, number of rotatable bonds, calculated LogP, number of hydrogen-bond donors, hydrogen-bond acceptors, chiral centers, chiral double bonds (E/Z isomerism), polar and apolar desolvation energy (in kcal/mol), net charge and rigid fragments. The database contains 494,915 Lipinski-compliant molecules and 202,134 'lead-like' molecules, the latter having molecular weight in the range 150 to 350, calculated LogP <4, number of hydrogen-bond donors <3, and number of hydrogen-bond acceptors <6. ZINC provides several search criteria, such as molecular property constraints, ZINC codes, vendor, and molecular substructure (14).
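A property filter of this kind can be sketched in plain Python. The example below applies the commonly stated form of Lipinski's rule of five (molecular weight at most 500, LogP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors). The first two entries use descriptor values reported in Table 1 of this paper; the third molecule is hypothetical and is included only to show a failing case. The class and function names are illustrative and do not belong to any screening package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float   # g/mol
    logp: float         # calculated LogP (XLogP3-AA in PubChem)
    h_donors: int
    h_acceptors: int


def lipinski_violations(m):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        m.mol_weight > 500,
        m.logp > 5,
        m.h_donors > 5,
        m.h_acceptors > 10,
    ])


def passes_rule_of_five(m, max_violations=0):
    """True if the molecule has no more violations than allowed."""
    return lipinski_violations(m) <= max_violations


library = [
    Molecule("3-hydroxy-2,3-diphenylbutanoic acid", 256.29644, 3.1, 2, 3),
    Molecule("2-ethyl-1,2-diphenylbutyl carbamate", 297.39142, 4.6, 1, 2),
    Molecule("hypothetical large lipophilic compound", 720.0, 6.8, 6, 12),
]
selected = [m.name for m in library if passes_rule_of_five(m)]
```

Setting `max_violations=1` gives the more permissive variant that tolerates a single violation.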

Docking and scoring 

The sorted ligands were then docked using GOLD and their interactions were studied. The protein was loaded into the GOLD wizard, followed by addition of hydrogens, selection of the binding site residues, choice of the scoring function (GOLD score), and running of the program. GOLD was the first docking algorithm to be evaluated on a large dataset of complexes (15). The docked poses are ranked by an empirical scoring function that estimates the free energy of binding of the molecules.

The Chemscore function has been implemented as a scoring function for the protein-ligand docking program GOLD, and its performance has been compared with that of the original Goldscore function in terms of docking accuracy, prediction of binding affinities, and speed. In terms of producing binding energy estimates, the Goldscore function appears to perform better than the Chemscore function and the two consensus protocols, particularly for faster search settings. Even at docking speeds of around 1-2 min/compound, the Goldscore function predicts binding energies with a standard deviation of ~10.5 kJ/mol (16). Based on the interaction studies and Lipinski's rule of five, four molecules were selected as hit molecules for further analysis in the wet lab. ChemDraw was the software used to sketch and build the molecular models.

Results 

Query sequence search using PSI-BLAST 



Using PSI-BLAST at the National Center for Biotechnology Information (NCBI), we searched the sequences with the query (UniProtKB ID: Q9H3M7) and obtained the five templates for thioredoxin shown in Figure 1. PSI-BLAST increased the sensitivity of the BLAST search and helped us to generate possible alignments between the thioredoxin query sequence and database sequences. It also constructed a position-specific substitution matrix for the query sequence during the search. The advantage of this approach was that we could obtain more accurate statistical estimates while retaining the speed of BLAST/PSI-BLAST in aligning the sequences. This gave us five corresponding PDB hits and alignments. Furthermore, using the multiple sequence alignment program, we determined the conserved positions, which are shown in Figures 1A and 1B.

Construction of 3-D models 

We constructed 3-D models from these templates using the I-TASSER web server. The target sequence was first threaded through a representative PDB structure library to select the best possible templates. From I-TASSER we obtained the C-score, TM-score (0.08) and RMSD (2 Å) values. Based on these models we could identify the binding site residues. I-TASSER (iterative threading assembly refinement) is a web server that provides an integrated platform for automated protein structure and function prediction. With it we generated five complete models, assessed by combining the confidence score (C-score) and protein length.

The algorithm of I-TASSER 

The advantage of using I-TASSER is that it generates three-dimensional (3D) atomic models from amino acid sequences by multiple threading alignments and iterative structural assembly simulations. Hence, the function of the protein could be inferred by structurally matching the 3D models
>P1:1G4MA

TRVFKKAGKLTVYLGKRDFVDHIDEPVDGVVLVD-PEYLKER — 

RVYVTLTCAFRYGREDLDVLGLTFRKDLFVANVQSFPPAPEDKKPLKLGEHAYPFTFEIPPN- 
LPCSVTGKACGVDYEVKAFCENLEEKIHKRNSVRLVIRKVQYAPERPGPQPTAETTRQF— 

LMSDKPLHLEASLDKEIYYHGEPISVNVHVTNNTNKTVKKIKISVRQYADICLFNTAQYKCPVAMEEADDTVAPSSTFCK 
V YTLTPFL AS STLLREG All V 

SYKVKVKLVVSRGSDVAVELPFTLM HPKPKDDDIVFEDFAR 

>P1:2WTRB

TRVFKKAGKLTVYLGKRDFVDHIDEPVDGVVLVD-PEYLKERRVYVTLTCAFRYG — 

VGLTFRKDLFVANVQSFPPAPEDKKPLTERLIKKLGEHAYPFTFELPPN-LPCSVTLKACGVDYEVKAFCAENLEKIHKRN 
SVRLVIRKVQYA— 

PERPQPTAETTRQFLMSDKPLHLEASLDKEIYYHGEPISVNVHVTNNTNKTVKKIKISVRQYADICLFNTAQYKCPVAMEE 
ADDTVAPSSTFCKVYTLT 

PTNLASSTLEILGIIVSYKVKVKLVVSRSSDVAVELPFTLMHPKPKEEPLDTNDDDIVFEDFARQR 
LKGMK 

>P1:2R51A 

SVEVEILLNDAESLFYDGETVSGKVSLSLKNPLEHQGIKIEFIGQIELYYDRGN — HHEFVSLVKDLA — R — PGEITQS-Q 

AFDFEFTHVEK-PYESYTGQNVKLRYFLRATISRRLN-DVVKE-DIVVHTLSTY — PELNSSIK-EVGIEDCLHIEFEYNKSKYHL 
KDVIVGKIYFLLVRIKIKH- 

EIDIIKRETTGTGPNVYHENDTIAKYE-DGAPVRGESIPIRLFLAGYELTPT-RDNKKFSVRYYLNLVLIDEEERRYFKQQEVVL 
G 

>P1:1CF1C

HVIFKKIKSVTIYLGKRDYIDHVEEPVDGVVLVD- 

PELVKGKRVYVSLTCAFRYGQEDIDVMGLSFRRDLYFSQVQVFASGATTRLQESLIKKLGATTRLQESLIKKLGANTYPFLLTF 
PDY-LPCSVM-KSCGVDFEIKAFATHEEDKIPKKSSVRLLIRKVQHAPRDMGPQPRAEASWQFFM — 
Figure 1. Alignment of the templates (PDB hits) used to generate the 3D model:

a. Alignment shown in the Clustal W multiple sequence alignment (MSA) program

b. Alignment shown in the Clustal W multiple sequence alignment (MSA) score table



with other known proteins. The output from the web server run was important because it contained full-length secondary and tertiary structure predictions, as well as functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. The accuracy of the predictions was estimated from the C-score of the modeling and from the TM-score, which measures the structural similarity of two proteins over their best equivalent residues.



Search for the binding sites 

Our search for the binding sites in the mod- 
elled structure was done by using I-TASSER 
server with which we could locate the exact 
positions of various amino acid residues at 
their respective binding sites. We made sure 
that these residues were in the binding pocket 
within the vicinity of the active site of the 
modelled protein, as shown in the results 
below: 
Figure 2. Binding site view for the I-TASSER model

PHE:14 ASN:15 ASP:16 PRO:17 GLU:18
VAL:20 VAL:146 ASP:147 VAL:149 PRO:331
PRO:332 CYS:333 TYR:334 HIS:342
ARG:343 LEU:344 GLU:345 SER:346
TYR:366 GLU:369 PHE:370 MET:373
PRO:374 PRO:376 TYR:378 THR:379

The identification of putative ligand-bind- 
ing sites on proteins is important for the pre- 
diction of protein function. Knowledge-based 
approaches using structure databases have be- 
come interesting, because of the recent in- 
crease in structural information (Figure 2). 

Verification and validation of the model by PROCHECK, Ramachandran plot, ERRAT value and Verify-3D

The built model was verified to check that it had been constructed correctly. Validation showed that more than 90% of the amino acid residues lay in the most favoured region of the Ramachandran plot. This indicates the stereochemical quality of the model used for the structural analysis and supports its use for studying target-ligand binding. The Ramachandran plot displays the main chain torsion angles phi and psi (φ, ψ; the Ramachandran angles) of a protein of known structure (Figure 3).
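As a minimal sketch of how a backbone torsion angle such as phi or psi is obtained from four consecutive atom positions, the following Python function computes a signed dihedral angle. The coordinates used to exercise it are hypothetical test points rather than atoms from the model, and the code is not taken from PROCHECK.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For phi the points are C(i-1), N(i), CA(i), C(i);
    for psi they are N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical test geometry around a central bond along the z axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Computing phi and psi for every residue in this way gives the points that are placed on the Ramachandran plot.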

The dihedral angle check uses the Ramachandran plot, which shows the phi-psi distribution. Each residue is classified according to its region: 'core', 'allowed', 'generous', or 'disallowed'. Residues in the generous and disallowed regions are highlighted on the plot. A log-odds score shows how normal or unusual the residue's location is on the Ramachandran plot for the given residue type. PROCHECK gave 96.3% of residues in the most favoured regions of the Ramachandran plot, which suggests that the predicted thioredoxin model is of good quality (Figure 4A).
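The reported percentage can be reproduced from the residue counts given in the PROCHECK plot statistics of Figure 4A (52 residues in the most favoured regions and 2 in the additional allowed regions, out of 54 non-glycine, non-proline residues). The short Python sketch below does this arithmetic; the function name and the dictionary layout are illustrative choices, not part of PROCHECK.

```python
def ramachandran_summary(counts):
    """Convert per-region residue counts into percentages (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to summarise")
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts as reported in the PROCHECK plot statistics (Figure 4A).
counts = {
    "most favoured": 52,
    "additional allowed": 2,
    "generously allowed": 0,
    "disallowed": 0,
}
summary = ramachandran_summary(counts)
# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
good_quality = summary["most favoured"] > 90.0
```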

ERRAT, a program designed for verifying crystal structures, was also used. Error values in this program are plotted as a function of the position of a sliding window of residues. The error function is based on the statistics of non-bonded atom-atom interactions in the modeled structure. Good high-resolution structures generally produce an overall quality factor of around 95% or higher in ERRAT. Our model had an overall quality factor of 95.506%, which indicates that its quality is comparable to that of the high-resolution database structures considered for the study (Figure 4B).

We further analyzed the compatibility of our predicted 3D model with its own amino acid sequence (1D) using the Verify-3D program. For each residue, the scores in a sliding 21-residue window (from -10 to +10) were added and plotted. In the returned 3D-1D profile, 92.33% of the residues in the predicted model had a 3D-1D score above 0.2 (Figure 4C).
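The windowing step can be illustrated with a short Python sketch that averages per-residue scores over a window of 21 residues (10 on each side) and reports the percentage of residues whose averaged score exceeds 0.2. The raw scores in the example are hypothetical, and truncating the window at the chain ends is an assumption of this sketch rather than a documented detail of Verify-3D.

```python
def windowed_average(scores, half_window=10):
    """Average each residue's score over residues i-half_window..i+half_window."""
    averaged = []
    for i in range(len(scores)):
        lo = max(0, i - half_window)                 # truncate at the N-terminus
        hi = min(len(scores), i + half_window + 1)   # truncate at the C-terminus
        window = scores[lo:hi]
        averaged.append(sum(window) / len(window))
    return averaged


def percent_above(values, cutoff=0.2):
    """Percentage of values strictly above the cutoff."""
    if not values:
        raise ValueError("no values supplied")
    return 100.0 * sum(v > cutoff for v in values) / len(values)


# Hypothetical raw 3D-1D scores for a three-residue toy chain.
raw_scores = [0.0, 0.3, 0.6]
profile = windowed_average(raw_scores, half_window=1)
passing = percent_above(profile, cutoff=0.2)
```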








Figure 3. Secondary structure and Ramachandran plot view of the model built with Modeller 9v8
PROCHECK

Ramachandran plot (trx-procheck); horizontal axis: Phi (degrees)

Plot statistics

Residues in most favoured regions [A, B, L]                  52    96.3%
Residues in additional allowed regions [a, b, l, p]           2     3.7%
Residues in generously allowed regions [~a, ~b, ~l, ~p]       0     0.0%
Residues in disallowed regions                                0     0.0%
Number of non-glycine and non-proline residues               54   100.0%
Number of end-residues (excl. Gly and Pro)                  228
Number of glycine residues (shown as triangles)               2
Number of proline residues                                   18
Total number of residues                                    302

trx-Procheck_01.ps

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.



Program: ERRAT2
Chain #: 1
Overall quality factor**: 95.506

ERRAT plot; horizontal axis: Residue # (window center)

*On the error axis, two lines are drawn to indicate the confidence with which it is possible to reject regions that exceed that error value. **Expressed as the percentage of the protein for which the calculated error value falls below the 95% rejection limit. Good high resolution structures generally produce values around 95% or higher. For lower resolutions (2.5 to 3 Å) the average overall quality factor is around 91%.



Verify-3D

92.33% of the residues had an averaged 3D-1D score > 0.2



Figure 4. Model evaluation by the SAVES server. A) PROCHECK Ramachandran plot, B) ERRAT program, C) Verify_3D program



Statistical analysis 

The statistical analysis in the Ramachandran plot compares observed and expected distributions of experimental observables and provided a powerful tool for the quality control of our protein structure. The distribution of backbone dihedral angles (the 'Ramachandran plot') has often been used for such quality control, but without a firm statistical foundation. Hence the output for a protein structure is a Ramachandran Z-score, expressing the quality of the Ramachandran plot relative to current state-of-the-art structures.

Model optimization 

Loop optimization of our model was done with Modeller 9V8 (17). The model had an initial potential energy of 55550610.692 and an initial RMS gradient of 115700.680. For energy minimization of the model we used GROMACS, with 1000 steps of the steepest descent algorithm followed by 5000 steps of the conjugate gradient algorithm. After minimization the potential energy was -931094.12296 and the RMS gradient was 0.63272. The models are presented in Figures 5A and 5B.
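The principle behind the steepest descent stage can be illustrated on a toy energy function. The Python sketch below minimizes a hypothetical two-variable quadratic with a fixed step size and stops when the RMS gradient falls below a tolerance; it is not the GROMACS implementation, which operates on a molecular force field and adapts its step size, and the conjugate gradient stage is not shown.

```python
import math


def rms(vector):
    """Root-mean-square magnitude of a gradient vector."""
    return math.sqrt(sum(g * g for g in vector) / len(vector))


def steepest_descent(gradient, x0, step=0.1, max_steps=1000, rms_tol=1e-6):
    """Move against the gradient until the RMS gradient is below rms_tol."""
    x = list(x0)
    for n_steps in range(max_steps):
        g = gradient(x)
        if rms(g) < rms_tol:
            return x, rms(g), n_steps
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, rms(gradient(x)), max_steps


def toy_energy(x):
    """Hypothetical energy surface with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2


def toy_gradient(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


start = [5.0, 5.0]
minimum, final_rms, steps_used = steepest_descent(toy_gradient, start)
energy_drop = toy_energy(start) - toy_energy(minimum)
```

As in the minimization reported above, progress is judged by the fall in potential energy and in the RMS gradient between the starting and final structures.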

Results of docking studies 

We selected almost 400 ligands from the PubChem and ZINC databases, and by virtual screening using Mercury and MarvinView we shortlisted four ligands as the best-fit ligands. These are listed below and are presented in Table 1.




Figure 5. Model generated after optimization using the steepest descent (1000 steps) and conjugate gradient (5000 steps) algorithms in GROMACS
Table 1. Molecule description

3-hydroxy-2,3-diphenylbutanoic acid
Source: PubChem
Molecular Weight: 256.29644 [g/mol]
Molecular Formula: C16H16O3
XLogP3-AA: 3.1
H-Bond Donor: 2
H-Bond Acceptor: 3
Rotatable Bond Count: 4
Exact Mass: 256.109944
Monoisotopic Mass: 256.109944
Topological Polar Surface Area: 57.5
Heavy Atom Count: 19

3-(hydroxyimino)-2,4-diphenylbutanenitrile
Source: PubChem
Molecular Weight: 250.29516 [g/mol]
Molecular Formula: C16H14N2O
XLogP3-AA: 3.4
H-Bond Donor: 1
H-Bond Acceptor: 3
Rotatable Bond Count: 4
Tautomer Count: 2
Exact Mass: 250.110613
Monoisotopic Mass: 250.110613
Topological Polar Surface Area: 56.4
Heavy Atom Count: 19

4-amino-3-pentadecylphenol
Source: PubChem
Molecular Weight: 319.52458 [g/mol]
Molecular Formula: C21H37NO
XLogP3-AA: 8.7
H-Bond Donor: 2
H-Bond Acceptor: 2
Rotatable Bond Count: 14
Tautomer Count: 13
Exact Mass: 319.287515
Monoisotopic Mass: 319.287515
Topological Polar Surface Area: 46.2
Heavy Atom Count: 23

2-ethyl-1,2-diphenylbutyl carbamate
Source: PubChem
Molecular Weight: 297.39142 [g/mol]
Molecular Formula: C19H23NO2
XLogP3-AA: 4.6
H-Bond Donor: 1
H-Bond Acceptor: 2
Rotatable Bond Count: 7
Tautomer Count: 2
Exact Mass: 297.172879
Monoisotopic Mass: 297.172879
Topological Polar Surface Area: 52.3
Heavy Atom Count: 22




(i) 3-hydroxy-2,3-diphenylbutanoic acid,
(ii) 4-amino-3-pentadecylphenol,
(iii) 3-(hydroxyimino)-2,4-diphenylbutanenitrile and
(iv) 2-ethyl-1,2-diphenylbutyl carbamate

Docking studies were performed on these four ligands using GOLD to evaluate the best docked ligand (Figure 6).

Since the results depend on the choice of scoring function in GOLD (Table 2), the analysis was restricted to ligands with a binding score greater than 50. There was also no overlap between the top-scoring compounds from protein-ligand scoring and those from ligand-based scoring (Table 2). The small molecules that we identified include compounds with similar chemical structures, similar modes of action, or drug interactions.

Figure 6. After docking of the similar ligands, a total of four ligands were shown to bind with a GOLD score greater than 50. All four ligands were docked to the minimized structure using GOLD, and the interaction of the best ligand with the top scores is shown.

Table 2. Chemscore-based interactions of molecules docked into the active site of thioredoxin

S.No  Score values                                   Molecule description
1     30.36   8.26  19.90  0.00   -5.26   26.999     3-hydroxy-2,3-diphenylbutanoic acid
2     20.54   0.00  32.29  0.00  -23.86  246.000     4-amino-3-pentadecylphenol
3     32.28   0.27  24.61  0.00   -1.83   72.000     3-(hydroxyimino)-2,4-diphenylbutanenitrile
4     -6.17   0.00  18.39  0.00  -31.46   41.000     2-ethyl-1,2-diphenylbutyl carbamate

Discussion 

The biological activity of all four best-fit predicted molecules is very poorly documented in several databases, including 'The Bibra Toxicity Profiles', which provides critical reviews of the most pertinent toxicological data published on commercially important chemicals. The FDA and the FDA poisonous plant database also do not list these compounds. Only one compound, 3-hydroxy-2,3-diphenylbutanoic acid, is listed in PubChem as an anticancer drug. According to NCI data, this molecule showed anticancer activity in an in vivo model, the L1210 leukemia tumor model (intraperitoneal) in B6D2F1 (BDF1) mice.



Conclusion 

Our studies on these molecules nevertheless indicate that these compounds are good candidates for anticancer activity. Our approach is therefore valuable for the drug discovery process and cancer therapy. There is now a need to study the pharmacological activity of these compounds in mice or other in vivo models.